Stata

Original author: William Gould[1]
Developer: StataCorp
Initial release: 1985
Stable release: 19.0 / April 8, 2025
Written in: C
Operating system: Windows, macOS, Linux
Type: Statistical analysis, numerical analysis
License: Proprietary
Website: www.stata.com

    Stata (/ˈsteɪtə/,[2] STAY-ta, alternatively /ˈstætə/, occasionally stylized as STATA[3][4]) is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including biomedicine, economics, epidemiology, and sociology.[5]

    Stata was initially developed by Computing Resource Center in California and the first version was released in 1985.[6] In 1993, the company moved to College Station, Texas and was renamed Stata Corporation, now known as StataCorp.[1] A major release in 2003 included a new graphics system and dialog boxes for all commands.[6] Since then, a new version has been released once every two years.[7] The current version is Stata 19, released in April 2025.[8]

    Technical overview and terminology

    User interface

    Stata has always included an integrated command-line interface. Starting with version 8.0, it has also included a graphical user interface that uses menus and dialog boxes to give access to many built-in commands. The dataset can be viewed or edited in spreadsheet format. From version 11 on, other commands can be executed while the data browser or editor is open.

    Data structure and storage

    Until the release of version 16,[9] Stata could only open a single dataset at any one time. Stata lets users choose the storage type of each variable. Its compress command automatically converts variables to storage types that take up less memory without loss of information. Stata provides integer storage types that occupy only one or two bytes rather than four, and single precision (4 bytes), rather than double precision (8 bytes), is the default for floating-point numbers.
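
    The effect of compress can be seen in a minimal sketch like the following, which assumes the auto example dataset that ships with Stata. The variable name price_dbl is hypothetical and is introduced only for illustration.

```stata
// Sketch: compare storage types before and after compress
sysuse auto, clear                  // load the example dataset shipped with Stata

generate double price_dbl = price   // hypothetical copy of price, stored as an 8-byte double
describe price price_dbl            // the storage type column shows int and double

compress price_dbl                  // convert to the smallest type that loses no information
describe price_dbl                  // the whole-number values now fit a 2-byte int
```

    Because every value of price is a whole number within the range of the int type, compress can shrink the copy without changing any value.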

    Stata's proprietary output language is known as SMCL, which stands for Stata Markup and Control Language and is pronounced "smickle".[10]

    Stata datasets are always tabular. Stata refers to the columns of tabular data as variables and to the rows as observations.
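
    As a small illustration of this terminology, the sketch below counts the rows and columns of the auto example dataset. It relies on the built-in _N (number of observations) and c(k) (number of variables).

```stata
// Sketch: rows are observations, columns are variables
sysuse auto, clear
display "observations (rows): " _N
display "variables (columns): " c(k)
list make price mpg in 1/3          // first three observations of three variables
```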

    Data format compatibility

    Stata can import data in a variety of formats. This includes ASCII data formats (such as CSV or databank formats) and spreadsheet formats (including various Excel formats).
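
    A round trip through a CSV file might look like the sketch below. The file name auto_copy.csv is hypothetical and the file is written to the current working directory; spreadsheet files are handled analogously by import excel.

```stata
// Sketch: write a CSV file and read it back
sysuse auto, clear
export delimited using auto_copy.csv, replace   // hypothetical file name
import delimited using auto_copy.csv, clear     // first row supplies the variable names
describe
erase auto_copy.csv                             // remove the illustrative file
```

    Value labels are written as text by default, so a labeled numeric variable such as foreign comes back as a string variable after the round trip.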

    Stata's proprietary file formats have changed over time, although not every Stata release includes a new dataset format. Every version of Stata can read all older dataset formats, and can write both the current format and, using the saveold command, the most recent previous format.[11] Thus, the current Stata release can always open datasets that were created with older versions, but older versions cannot read datasets in newer formats.
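
    The difference between save and saveold can be sketched as follows. The file names auto_current and auto_legacy are hypothetical, and which older format saveold writes depends on the Stata release in use.

```stata
// Sketch: save in the current format and in an older format
sysuse auto, clear
save auto_current, replace      // current .dta format
saveold auto_legacy, replace    // older .dta format, readable by earlier releases

use auto_legacy, clear          // the current release reads the older format
describe, short

erase auto_current.dta          // remove the illustrative files
erase auto_legacy.dta
```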

    Stata can read and write SAS XPORT format datasets natively, using the fdause and fdasave commands.

    Some other econometric applications, including gretl, can directly import Stata file formats.

    History

    The development of Stata began in 1984, initially by William (Bill) Gould and later by Sean Becketti. The software was intended to compete with statistical programs for personal computers such as SYSTAT and MicroTSP.[6] Written in the C programming language, Stata was released for MS-DOS in 1985 with 44 commands.[6] Since then, versions of Stata have been released for systems running Unix variants such as Linux distributions, Windows, and macOS.[6] All Stata files are platform-independent.

    Commands in Stata 1.0 and Stata 1.1
    append     dir       infile   plot     spool
    beep       do        input    query    summarize
    by         drop      label    regress  tabulate
    capture    erase     list     rename   test
    confirm    exit      macro    replace  type
    convert    expand    merge    run      use
    correlate  format    modify   save
    count      generate  more     set
    describe   help      outfile  sort

    There have been 19 major releases of Stata between 1985 and 2025 and additional code and documentation updates between major releases.[7] In its early years, extra sets of Stata programs were sometimes sold as "kits" or distributed as Support Disks. With the release of Stata 6 in 1999, updates began to be delivered to users via the web.[6]

    Hundreds of commands have been added to Stata since its initial release in 1985.[12][13] Certain developments have proved to be particularly important and continue to shape the user experience today, including extensibility, platform independence, and the active user community.[6]

    Extensibility

    The program command was implemented in Stata 1.2, giving users the ability to add their own commands.[6][14] ado-files followed in Stata 2.1, allowing a user-written program to be automatically loaded into memory. Many user-written ado-files are submitted to the Statistical Software Components Archive hosted by Boston College. StataCorp added an ssc command so that community-contributed programs can be installed directly from within Stata.[15] More recent versions of Stata allow users to run Python code from within Stata, and allow Python environments such as Jupyter Notebook to call Stata commands.[16] Although Stata does not support R natively, there are user-written extensions for running R scripts from Stata.[17]
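
    A user-defined command can be sketched as below. The command name showrange is hypothetical and serves only to illustrate the program command; it is not part of Stata or of any archive.

```stata
// Sketch: define a small user-written command with program
capture program drop showrange      // allow the do-file to be rerun
program define showrange, rclass
    // hypothetical command: report the range of one numeric variable
    syntax varname(numeric)
    quietly summarize `varlist'
    display as text "Range of `varlist': " as result r(max) - r(min)
    return scalar range = r(max) - r(min)
end

sysuse auto, clear
showrange mpg                       // call the new command like a built-in one
```

    Saving the program definition in a file named showrange.ado somewhere on Stata's ado-path would let Stata load the command automatically the first time it is called, which is the mechanism ado-files provide.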

    User community

    A number of important developments were initiated by Stata's active user community.[6] The Stata Technical Bulletin, which often contained user-created commands, was introduced in 1991 and issued six times a year. It was relaunched in 2001 as the peer-reviewed Stata Journal, a quarterly publication containing descriptions of community-contributed commands and tips for the effective use of Stata. In 1994, a listserv began as a hub for users to collaboratively solve coding and technical issues; in 2014, it was converted into a web forum. In 1995, StataCorp began organizing user and developer conferences that meet annually. Only the annual Stata Conference, held in the United States, is hosted by StataCorp. User group meetings are also held annually in the UK, Germany, and Italy, and less frequently in several other countries. Local Stata distributors host these user group meetings in their own countries.

    Table: Releases and Development of Stata
    Version Release date Select new or enhanced features
    1.0 January 1985
    • Initial release
    • Forty-four commands
    1.1 February 1985
    • Bug fixes
    1.2 May 1985
    • New menu system
    • Better online help
    • keep
    1.3 August 1985
    • Stata/Graphics
    • program
    1.4 August 1986
    • New documentation
    • Formatted infile
    1.5 February 1987
    • anova
    • logit, probit
    2.0 June 1988
    • New graphics
    • String variables
    • Survival analysis: Cox and Kaplan-Meier
    • Stepwise regression
    2.1 September 1990
    • Byte variables
    • Factor analysis
    • ado-files
    • reshape
    3.0 March 1992
    • logistic, ologit, oprobit, clogit, mlogit
    • tobit, cnreg, rreg, qreg, weibull, ereg
    • epitab
    • pweights
    3.1 August 1993
    • mvreg, sureg, heckman, nlreg, areg, canon
    • nbreg
    • constrained linear regression
    • ml
    • codebook
    4.0 January 1995
    • xtreg
    • glm
    5.0 October 1996
    • xtgee, xtprobit
    • prais, newey, intreg
    • survey estimation commands
    • fracpoly
    • st extended
    6.0 January 1999
    • web aware
    • new ml
    • time-series operators
    • arima, arch
    • st rewritten
    7.0 December 2000
    • frailty
    • xtabond
    • cluster analysis
    • nlogit
    • roc
    • SMCL
    8.0 January 2003
    • graphics
    • extended GUI, dialog boxes available for all commands
    • manova
    • more survey
    • more time series (VARs, SVARs)
    • more GLLAMM internalization
    8.1 July 2003
    • updated ml
    8.2 October 2003
    • graphics changes
    9.0 April 2005
    • Mata matrix programming language
    • survey features
    • linear mixed models
    • multinomial probit models
    9.1 September 2005
    9.2 April 2006
    10.0 June 2007
    • graph editor
    • logistic and Poisson models with complex, nested error components
    10.1 August 2008
    11.0 July 2009
    • factor variables
    • margins postestimation command
    • multiple imputation
    11.1 June 2010
    11.2 March 2011
    12.0 July 2011
    • automatic memory management
    • structural equation modeling
    12.1 January 2012
    13.0 June 2013
    • long strings
    • treatment effects
    13.1 October 2013
    14.0 April 2015
    • Unicode support
    • Bayesian statistical analysis
    14.1 October 2015
    14.2 September 2016
    15.0 June 2017
    • latent class analysis
    • PDF and Word documents
    • color transparency or opacity in graphs
    15.1 November 2017
    16.0 June 2019
    • frames (multiple datasets in memory)
    • lasso regression
    • automated reporting
    • updated choice models
    16.1 February 2020
    17.0 April 2021
    • updated tables command
    • Bayesian econometrics
    18.0 April 2023
    • Bayesian model averaging
    • causal mediation analysis
    • heterogeneous difference-in-differences

    Software products

    There are four builds of Stata: Stata/MP, Stata/SE, Stata/BE, and Numerics by Stata.[18] Stata/MP has built-in parallel processing for certain commands, whereas Stata/SE and Stata/BE are limited to a single core.[19] On four CPU cores, Stata/MP runs certain commands about 2.4 times faster than the SE or BE versions, roughly 60% of the theoretical maximum speedup of 4.[19] Numerics by Stata allows for web integration of Stata commands.

    The builds also differ in the size of the datasets they can hold. Stata/MP can store 10 to 20 billion observations and up to 120,000 variables, whereas Stata/SE and Stata/BE store up to 2.14 billion observations and handle up to 32,767 and 2,048 variables, respectively. The maximum number of independent variables in a model is 65,532 in Stata/MP, 10,998 in Stata/SE, and 798 in Stata/BE.[18]

    The pricing and licensing of Stata depend on its intended use: business, government/nonprofit, education, or student. Single-user licenses are either renewable annually or perpetual. Other license types include a single license for use by concurrent users, a site license, a volume single-user license for bulk pricing, and a student lab license.[20]

    Example code

    The following commands perform simple data management.[21]

    sysuse auto                 // Open the included auto dataset
    browse                      // Browse the dataset (opens the Data Editor window)
    
    describe                    // Describes the dataset and associated variables
    summarize                   // Summary information about numerical variables
    
    codebook make foreign       // Summary information about the make (string) and foreign (numeric) variables
    
    browse if missing(rep78)    // Browse only observations with missing data for variable rep78
    list make if missing(rep78) // List makes of the cars with missing data for variable rep78
    

    The next set of commands moves on to descriptive statistics.

    summarize price, detail          // Detailed summary statistics for variable price
    
    tabulate foreign                 // One-way frequency table for variable foreign
    tabulate rep78 foreign, row      // Two-way frequency table for variables rep78 and foreign
    
    summarize mpg if foreign == 1    // Summary information about mpg if the car is foreign (the "==" sign tests for equality)
    by foreign, sort: summarize mpg  // As above, but using the "by" prefix.
    tabulate foreign, summarize(mpg) // As above, but using the tabulate command.
    

    A simple hypothesis test:

    ttest mpg, by(foreign) // T-test for difference in means for domestic vs. foreign cars
    

    Graphing data:

    twoway (scatter mpg weight)                     // Scatter plot showing relationship between mpg and weight
    twoway (scatter mpg weight), by(foreign, total) // Three graphs for domestic, foreign, and all cars
    

    Linear regression:

    generate wtsq = weight^2                      // Create a new variable for weight squared
    regress mpg weight wtsq foreign, vce(robust)  // Linear regression of mpg on weight, wtsq, and foreign
    predict mpghat                                // Create a new variable containing the predicted values of mpg
    twoway (scatter mpg weight) (line mpghat weight, sort), by(foreign) // Graph data and fitted line
    
    Figure: Regression graphs from the auto dataset in Stata 17
